Skip to content

WebAssembly Branch Hinting? #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 61 commits into from
Closed

WebAssembly Branch Hinting? #1

wants to merge 61 commits into from

Conversation

kripken
Copy link
Owner

@kripken kripken commented Jun 11, 2025

@dschuff @sbc100 This is an early beginning of an attempt to support branch hints in LLVM, but I honestly do not really know what I am doing 😄 Maybe you can tell me if I am even in the right direction.

Background: This is an example of an LLVM branch weights hint:

define i32 @branch_weights_test(i32 %a, i32 %b) {
  %1 = icmp ult i32 %a, %b
  br i1 %1, label %fail, label %success, !prof !0

fail:
  ret i32 -1

success:
  ret i32 0
}

!0 = !{!"branch_weights", i32 42, i32 1337}

That br has 42/(42+1337) (estimated) chance to go to the first target.

Reading stuff in the wasm backend and lld, it seems like there is a generic mechanism for generating custom sections in LLVM IR, compiling them, and linking them, so I was wondering if appending something like this could work:

!wasm.custom_sections = !{!1}
!1 = !{!"metadata.code.branch_hint", !"\01\02\03\FF"} ;; right section name, nonsense bytes

It looks like llc emits a custom section from that. So "all" I need is to replace the nonsense bytes with the proper info.

The proper info should somehow refer to the instruction. I am guessing a relocation is needed..? We'd need one both for the function and the offset of the instruction in the function. Is that possible, or should I be doing something entirely different?

(The code in this PR adds an IR pass to the wasm backend, and successfully reads branch_weights. It does not emit anything yet.)

@dschuff
Copy link

dschuff commented Jun 11, 2025

The proper info should somehow refer to the instruction. I am guessing a relocation is needed..? We'd need one both for the function and the offset of the instruction in the function. Is that possible, or should I be doing something entirely different?

I think that's possible in principle. In assembler terms you can have a label attached to the function start and to each instruction, and you can express the difference between them as an assembler expression. Actually I think that means that you don't even need a relocation, the label difference can be calculated at assembly time.

If we wanted to lower the branch weight metadata into wasm custom section metadata at the LLVM level, then we'd need to make the section payload be something other than an LLVM string constant; it would have to be something that symbolically links to the function or instruction. I guess this must be possible somehow, as debug info does something like that; but it might be a special case. But we'd need something that would survive backend passes reordering the blocks and such.

It might make more sense to do something at IR lowering time instead, and attach the data somehow to the MachineInstr representing the branch. Actually come to think of it, other targets must do something like this too, right? Since the passes that use the branch weights would be backend passes like block placement. Maybe there's some common code we can use for that, that we can use at the end of the backend pipeline when we go from MachineInstr IR to MC.

@sbc100
Copy link

sbc100 commented Jun 11, 2025

This does sounds like something that would need to a relocation, at least for the function index.

As derek said the offset from the begining of the function can likely be calculated by the assembly so should not need a relocation (i.e. the linker cannot change this since it never touches the bytes of a function).

The current relocation we have are listed in llvm/include/llvm/BinaryFormat/WasmRelocs.def. What is the encoding that you need for the function index? We currently have R_WASM_FUNCTION_INDEX_LEB and R_WASM_FUNCTION_INDEX_I32 relocations types.

@kripken
Copy link
Owner Author

kripken commented Jun 11, 2025

@dschuff

It might make more sense to do something at IR lowering time instead, and attach the data somehow to the MachineInstr representing the branch. Actually come to think of it, other targets must do something like this too, right? Since the passes that use the branch weights would be backend passes like block placement.

Makes sense. I got stuck finding where the IR metadata gets converted into the machine layer, but I did find how things work there: Machine*Passes use MachineBranchProbabilityInfoWrapperPass which provides MBPI->getEdgeProbability(source, target). I verified that works correctly in the last push to this branch: now we find br_if|unless MachineInstrs and print their probabilities.

@sbc100

What is the encoding that you need for the function index?

For the IR option here, we'd need the usual function index LEB, which I am guessing R_WASM_FUNCTION_INDEX_LEB is? The trickier part would be emitting the instruction offset relative to the function start, as @dschuff said, probably debug info does something like that, but I'm not sure how practical that is in general.

Overall it seems like we could do this either

  • At the IR level, emitting custom section metadata, if we can emit a relocatable offset of an instruction relative to the function start. Getting the weight inputs is in the previous state of the PR.
  • At the Machine level, if we can emit a custom section from there (if not maybe we'd need to shuttle the information around to the proper place?). Getting the weight inputs is in the current state of the PR.

I'll look into this some more, but any advice between these two very different paths would be welcome, given I'm not familiar with LLVM much.

@dschuff
Copy link

dschuff commented Jun 11, 2025

I think the latter would be most straightforward. If you look in https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp you can see where we create custom sections for function attributes (EmitFunctionAttributes) and for the implementation of !wasm.custom_sections (in emitEndOfAsmFile). In this file you can use the MC API (which corresponds more or less directly to the features available in assembers), so to create a custom section you can basically switch to your custom section whenever you want (even during emission of the code section). So for example, during emitFunctionBodyStart, you can emit the relocatable LEB for the function index (and the header for the vector of metadata entries? I guess that would be easier if we had some way to know how many there are before emitting the function); then during emitInstruction you can emit an MCExpr for the offset.

You'll end up with asm code that switches back and forth between the code and metadata sections, emitting bytes incrementally to the latter (just as each instruction emits incrementally to the code section). The assembler will combine these into one section in the object file, and then the linker will combine them into one section in the output. We just need to figure out how get the header correct (i.e. the length of the vector of functions)?

The other option (instead of incrementally emitting the custom section during instruction emission) could be to emit the whole function's worth of the section at once at the end of the function (or even at the end of the object file). That could work if we can't know ahead of time how many annotations each function has, but I think we'd still need to make sure we emit labels for each branch (during instruction emission) to calculate the offsets, and we'd still need to figure out how to get things right after linking.

@kripken
Copy link
Owner Author

kripken commented Jun 12, 2025

Thanks, I think I wrapped my head around some AsmPrinter basics, and got it to emit an empty branch hint section... next, the real 99%

@kripken
Copy link
Owner Author

kripken commented Jun 20, 2025

Btw, the error you were getting before about the expression not being absolute was emitting the function symbol, not the label difference, right?

Yes, at some point I noticed that diff ended up as absolute, I guess since the two symbols are in the same section. So all the problem here is really just about the relocatable function symbol index.

if (Value->evaluateAsAbsolute(IntValue, getAssemblerPtr())) {
emitULEB128IntValue(IntValue);
return;
}
insert(getContext().allocFragment<MCLEBFragment>(*Value, false));

if (!PadTo) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be even more specific here, the thing I was hoping for was that we could tell, based on some property of Value, whether there needs to be a relocation. If so, we can basically just assume that the size will be 5 (The only non-32-bit relocation I know of would be for memory addresses, so there could be i64.const instruction in code with 9-byte relocatable LEBs, but I guess those are handled differently elsewhere). And then we don't need the explicit PadTo in the streamer API or the separate assembler directive.
The question of course, is whether that's possible. MCExpr has evaluateAsRelocatable alongside evaluateAsAbsolute but currently I'm not totally sure what it means.

@@ -609,6 +654,65 @@ void WebAssemblyAsmPrinter::emitJumpTableInfo() {
// Nothing to do; jump tables are incorporated into the instruction stream.
}

std::optional<bool> WebAssemblyAsmPrinter::getBranchHint(const MachineInstr& MI) {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aheejin This is the code I mentioned in the meeting, PTAL if you have time. Maybe there is a better way to do this?

I added some comments here and in the calling code that hopefully explain why it works as it does. At a high level:

  • During emitInstruction we see if we need a hint on a br_if. We do it at that time, so that we can emit a Label if we want a hint (we need the label to refer to the br_if from the custom section). That is after CFGStackify. At this stage we just emit the Labels and stash the information.
  • After emitting all functions, we gather the stashed information, and emit the final section.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is a better way if we do this after CFGStackify, and this is roughly what I'd do as well. But this code may not handle try-catch and exceptions.

When EH is enabled, one of a BB's successor can be an unwind destination, so in theory a BB can have three successors. I don't think we generate that kind of code, but I think it is safe to check it and exclude it. (by calling MachineBasicBlock::isEHPad() or something)

And with try-catch the fallthrough block may not be the next block. For example:

try
  ...
  br_if ...
catch            ;; br_if falls through to NOT here
  ...
end
                 ;; br_if falls through to here

In this case both the true and false destinations will be successors but it may be tricky to figure out which one is which... To figure this out we may need to keep track of the control flow stack so we can compute the (true) destination of the branch? Which sounds like an overkill. This kind of fallthrough is generated by this optimization which may not be critical, meaning if someone wants performance they would've run binaryen anyway, so if this becomes a problem we can consider removing it.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, what does an unwind target from a BB that ends in a br_if mean? That there is a call in the middle of the BB, and it threw? (Do such calls not need to be terminators of BBs? Sorry, I don't know much about wasm backend control flow...)

Copy link

@aheejin aheejin Jun 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Come to think of it, I think there can't really be a case with three successors (two branch successors and an unwind successor). Because, so when a BB has a call that can throw and has an unwind BB successor, before CFGStackify it would look like

...
bb:
  ;; successor: ehpad, cont
  call $maythrow
  br $cont

ehpad:
  catch ...

cont:
  ...

And we remove that br $cont in CFGStackify because it is not necessary. In this case there's no chance that bb will end with br_if. Also this is not feasible:

bb:
  ;; successor: ehpad, somewhere, cont
  call $maythrow
  br_if $somewhere
  br $cont

ehpad:

Because a BB with a throwing call (and has an unwind successor) wouldn't have a br_if after the throwing call.

What I can't guarantee we can't have is the case of try block's last BB that doesn't have an unwind dest BB. Because we can wrap many throwing calls within a big try-catch, the last BB within a try part may not be a BB with an unwind destination. For example:

bb0:
  ;; successor: ehpad, cont
  call $maythrow
  br $cont

bb1:
  ;; successor: ehpad, cont
  call $maythrow
  br $cont

bb2:
  ;; successor: somewhere, cont
  ... something else ...
  br_if $somewhere
  br $cont

ehpad:
  ...

cont:

In this case, if br $somewhere_else is removed, we don't have the info on bb2's fallthrough block. This situation is pretty contrived and it may not happen in reality. I just can't guarantee it won't. But given that this branch hint is afer all a 'hint', we may not need to consider this case. When seeing a br_if, just checking if the fallthrough (= next) block is an EH pad, and if so, not annotating that branch can be sufficient.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About

In this case, if br $somewhere_else is removed

Was this a typo of br $cont, or br_if $somewhere?

Copy link
Owner Author

@kripken kripken Jun 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not aware about terminator lists until you wrote this comment... interesting.

PTAL at the commit where I just tried to handle the corner cases you mention.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About

In this case, if br $somewhere_else is removed

Was this a typo of br $cont, or br_if $somewhere?

Sorry, br $cont

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some comments below. Please ignore my first comment (which I've now deleted); I was confused and the comment was incorrect.

Comment on lines 700 to 701
// the parent block of this BR_IF. (If there are three successors, due to some
// terminator before us, give up on this hint.)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about due to an additional unwind successor or ... EH pad successor?

Comment on lines 717 to 718
// It must be one of the successors.
assert(FalseDest == MBB1 || FalseDest == MBB2);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion does not hold if we fall through to an EH pad

Comment on lines 719 to 734
// Stop if the fallthrough is an EH pad, because that could happen like this:
//
// bb1:
// ;; successor: if.true, otherwise
// ..
// br_if $if.true
// br $otherwise
//
// ehpad:
//
// If that br $otherwise is optimized out, it would look like we fall through
// to the ehpad (since it is the block physically after us), but in wasm's
// structured control flow that is not the case. Rather than try to find the
// true fallthrough, give up on a hint in this rare case.
if (FalseDest->isEHPad())
return {};
Copy link

@aheejin aheejin Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can have another corner case too... For example

bb0:
  ...
  br_if

bb1:
  delegate

ehpad:
  catch

So this delegate here would have been inserted after we first place try-catch markers to fix unwind mismatches. We add delegate in a separate BB because it is also a terminator. In this case the fallthrough BB is not an EH pad but we still shouldn't fall through to the next BB.

I think a safer solution could be... Just bail out when one of the ParentBB's successor is not the next BB in the layout order. I think this can handle both cases.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that is much simpler and makes perfect sense. The last commit includes that and other fixes from your comments.

Copy link

@aheejin aheejin Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this part looks good to me now (I haven't yet looked at other parts)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't look carefully at the other parts yet 😆 There are some bad hacks, and I am still debugging the case of multiple files...

@aheejin
Copy link

aheejin commented Jun 30, 2025

I haven't checked the code yet, but this PR was uploaded yesterday: llvm#146230

@kripken
Copy link
Owner Author

kripken commented Jun 30, 2025

Thanks! Good to know...

I'll take a closer look Monday, but looks like that PR is in better shape than mine, none of the horrible hacks at least... so I imagine the LLVM side of my work won't be needed. Funny timing, I just got multi-file hinting working Friday and the first full benchmarks up and running 😆

@kripken
Copy link
Owner Author

kripken commented Jun 30, 2025

That other PR definitely looks like the way to go in LLVM, compared to this, so I'll close this one...

Last update on my work in this area, the code here is hackish but fully functional, though it does need --no-gc-sections during link (to avoid emitting a branch hint referring to a function that was dce'd). With that, I could run real-world benchmarks. I found a few projects that looked relevant in the emscripten benchmark suite:

  • SQLite annotates likely() conditions, however, those are meant for the reader, not the optimizer.
  • Bullet has likely() conditions, but actually only in the SPU backend (Cell processor from the PS3 back in the day, which had no branch predictor, so it needed manual annotations of likelihood...)
  • Poppler. Has likely annotations in good places, and the final binary has 1,000 of them. However, the benchmark is no faster.

Somewhat disappointing, but I did confirm that making an artificial benchmark with a 30% benefit is not difficult: https://gist.github.com/kripken/09ad895764cd6cd3b7a8a6c13c643c93

I'll wait for that other PR to land in LLVM, and meanwhile am working on wasm-opt optimizer support for branch hints (properly updating the hint when we flip an if, for example).

@kripken kripken closed this Jun 30, 2025
@Lukasdoe
Copy link

Lukasdoe commented Jul 1, 2025

Branch hint support (or support for code metadata in general) in wasm-opt would be great! Currently, I have to hard-disable wasm-opt and meta-dce in emscripten since (as you said) it invalidates branch hints.

I'll respond to your other points about the performance under the other PR. Sorry for noticing this PR here so late, it didn't appear in any searches :/

@kripken
Copy link
Owner Author

kripken commented Jul 1, 2025

Sorry for noticing this PR here so late, it didn't appear in any searches :/

No worries, maybe I should have opened it on LLVM, but it began as more of a question/experiment...

Anyhow, what matters is we get good support in LLVM, and your PR looks right for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants